Revealing the spatiotemporal requirements for accurate subject identification with resting-state functional connectivity: a simultaneous fNIRS-fMRI study

Abstract. Significance Brain fingerprinting refers to identifying participants based on their functional patterns. Despite its success with functional magnetic resonance imaging (fMRI), brain fingerprinting with functional near-infrared spectroscopy (fNIRS) still lacks adequate validation. Aim We investigated how fNIRS-specific acquisition features (limited spatial information and nonneural contributions) influence resting-state functional connectivity (rsFC) patterns at the intra-subject level and, therefore, brain fingerprinting. Approach We performed multiple simultaneous fNIRS and fMRI measurements in 29 healthy participants at rest. Data were preprocessed following the best practices, including the removal of motion artifacts and global physiology. The rsFC maps were extracted with the Pearson correlation coefficient. Brain fingerprinting was tested with pairwise metrics and a simple linear classifier. Results Our results show that average classification accuracy with fNIRS ranges from 75% to 98%, depending on the number of runs and brain regions used for classification. Under the right conditions, brain fingerprinting with fNIRS is close to the 99.9% accuracy found with fMRI. Overall, the classification accuracy is more impacted by the number of runs and the spatial coverage than the choice of the classification algorithm. Conclusions This work provides evidence that brain fingerprinting with fNIRS is robust and reliable for extracting unique individual features at the intra-subject level once relevant spatiotemporal constraints are correctly employed.

spectroscopy (fNIRS) share similar hemodynamic origins and reveal brain function by relying on the neurovascular coupling to infer neural activity and extract functional connectivity (FC) maps. [1][2][3][4][5] However, the low signal-to-noise ratio (SNR) intrinsic to both modalities ultimately leads to a lack of intra-subject reproducibility, which is enlarged by several confounding factors, including motion artifacts and physiology from noncortical tissues. [6][7][8][9][10][11] To overcome this limitation, researchers have often opted to perform group analysis by gathering data from different subjects based on their shared features. Although this approach increases the overall SNR and eases group comparisons, collapsing data from participants ignores unique individual features that may hold great potential for subject discrimination-a relevant feature that is valuable for several purposes, including clinical applications in which individualized treatments could benefit patient recovery and outcome. [12][13][14] The individual analysis ultimately depends on data reproducibility, which can be quantitatively assessed with test-retest protocols (i.e., experimental designs in which data are acquired from the same participant several times within days or a few weeks). To this end, several test-retest protocols have been previously applied to characterize the variability of fMRI and fNIRS data. 10,12,[15][16][17][18][19][20][21][22] Because healthy brain function is not expected to change significantly at the macroscale within a few days, repeated runs/sessions can help to develop optimal approaches for data acquisition (such as fMRI sequences and fNIRS probe positioning) and for analysis pipelines to yield more reproducible results at the intra-subject level. 9,10,[22][23][24][25][26] Thanks to these efforts, recent methodologies have increased the intra-subject reproducibility to a level that it is possible to accurately identify individuals based on their FC maps, i.e., brain fingerprinting.
Since the pioneering work by Finn et al., novel approaches have improved brain fingerprinting with FC patterns measured with BOLD-fMRI. 12 For example, Amico and Goñi explored principal component analysis (PCA) for assessing and optimizing individual identification and showed that unique features at the intra-subject level could be better reconstructed in the connectivity domain. 14 More recently, Venkatesh et al. demonstrated that the geodesic distance between a pair of correlation matrices yields higher individual classification rates than simply employing the Pearson correlation coefficient to compute the similarity across FC maps from different sessions. 27 The geodesic distance outperforms linear metrics because it does not neglect that correlation matrices lie in a nonlinear space. 27,28 Brain fingerprinting with fNIRS consolidates the feasibility of extracting robust and reliable individual neural features from fNIRS-based hemodynamics, paving the way for continuous bedside clinical works focused on single-patient changes in the short term. 29,30 A better understanding of what drives subject identification could allow us to differentiate between participantand session-dependent brain connectivity patterns, which is critical for isolating longitudinal brain changes from spurious fluctuations of the measured signal. Despite its potential, only a few works have attempted to perform brain fingerprinting with fNIRS to date, 31,32 which is probably related to the fact that fMRI methods are not directly translated to fNIRS due to the intrinsic noise properties of the latter. Specifically, fNIRS' high temporal resolution allows for the removal of systemic physiological noise, and it samples the brain faster than the hemodynamic changes and leads to high temporal autocorrelation in the hemoglobin time series. 33,34 The autocorrelation violates the statistical assumptions for the Pearson correlation and inflates the correlation coefficients, resulting in spurious FC maps. 35 In addition, fNIRS signals are contaminated by extracerebral hemodynamics and motion artifacts, 20,36-45 and both confounding factors can profoundly affect the comparison between hemoglobin time series and decrease sensitivity to actual cortical connectivity patterns. 20,22,35 Recent advances in fNIRS processing and analysis have introduced and validated robust methodologies for each of these challenges, opening new doors to properly translating brain fingerprinting protocols to fNIRS. 9,45,46 However, several experimental and methodological concerns-such as the number and location of the optodes, the amount of data, and the classification method-remain unresolved and could be addressed by a robust methodology for identifying subjects.
In the present work, we aimed to analyze the constraints and limitations associated with the problem of brain fingerprinting. To this end, we acquired resting-state fMRI and fNIRS data simultaneously over several runs. For the fNIRS data, we applied a previously developed and validated preprocessing pipeline, which included motion artifact correction and global systemic physiology removal. 45 We then quantified how subject identification accuracy was affected by the number of runs, the quantity of information provided by the different neuroimaging techniques, and the methodology used to compare the FC maps. Based on the hypothesis that data-driven techniques would benefit more from using different runs from the same individual than similarity metrics, we compared the performance of a linear classifier with standard matrix distance approaches. For fNIRS in particular, we also explored how the combination of different contrasts (i.e., types of hemoglobin) contributes to the problem.

Subjects and Experimental Protocol
A total of 29 healthy subjects (26 males, 18 to 34 years old) contributed to this study. All measurements were performed inside the MRI unit. We instructed all participants to close their eyes, stay relaxed, not move, and not focus their thoughts on any specific task. In each participant, we collected six runs of MRI and fNIRS simultaneously for six minutes each. Because MRI requires one structural image (high-resolution, T1-weighted) run, each participant contributed five and six functional measurements of BOLD-fMRI and fNIRS, respectively. The local ethical committee at the University of Campinas (where the experiments were performed) approved the experimental protocol. Participants provided written informed consent before data acquisition.

fMRI Signal Acquisition and Preprocessing
The MRI data were collected on a Philips Achieva 3 T scan with a 32-channel head coil. We acquired one T1-weighted structural image [Repetition Time (TR) = 7 ms, Time to Echo (TE) = 3.2 s] and five functional resting-state scans (180 volumes, TR = 2 s, TI = 900 ms). The BOLD data were preprocessed with the SPM12 and the UF2C toolbox. 47 We employed a preprocessing pipeline that included normalization, motion artifact correction (framewise displacement (FD) and DVARS), band-pass filtering between 0.009 and 0.08 Hz, stopband attenuation of 50 dB, regressions of white matter, cerebral spinal fluid, and global signal regression. 7,23,26,[48][49][50] The FD and DVARS thresholds were 0.5 mm and 5%, respectively. 23 All runs with less than 4 min without motion artifacts were excluded; 23,26,51 this procedure removed a total of 22 runs across all 29 participants. After the removal of runs, eight participants had less than three good BOLD-fMRI runs and were excluded from further analysis. For fMRI comparison with fNIRS, we parceled the brain cortex into 94 regions of interest (ROIs) using the anatomical automatic labeling approach. 52 The time series of each ROI was estimated as the average of all voxels within the ROI.

fNIRS Signal Acquisition and Preprocessing
All fNIRS measurements were performed with a commercial continuous-wave NIR system (NIRScout, NIRx Medical Systems). We designed the optical probe with 16 light sources [light-emitting diodes (LEDs) centered at 760 and 850 nm] and 32 detectors, allowing 64 channels (i.e., source-detector pairs) with distances ranging from 2.8 to 3.5 cm and a temporal resolution of 7.8 Hz. Using a 10-20 standard cap, 53 the optical probe was placed on the head, covering most of the cortical regions on the temporal, parietal, frontal, and occipital lobes. We used a commercial magnetic motion tracking sensor (Fastrack, Polhemus, Colchester, Vermont) to digitize the location of the optodes on the head and then co-register them on the Colin27 model with the AtlasViewer Matlab package 54 (Fig. 1). In addition to the optodes, five anatomical landmarks (Nz, Cz, Iz, A1, and A2 in the 10 to 20 system) were also recorded as AtlasViewer requires them to determine the affine transformation from the atlas space to the digitized space of the subject.
For preprocessing the fNIRS data, we used in-house MatLab scripts based on existing HomER 2 functions. 55 First, we pruned channels with a low SNR (SNR < 8; SNR is defined as the mean divided by the standard deviation) and excluded individual runs that did not have at least 50% good channels. Four participants did not have at least four good runs after our pruning criteria and were excluded from further analysis. Finally, we removed consistent bad channels (>70% across all participants) to keep the same number of channels across all subjects. This procedure excluded 16 channels, leading to a probe with 48 remaining channels for further analysis.
For the good-quality channels that survived the quality criteria described above, we converted the measured light intensity to optical density and corrected motion artifacts with a hybrid algorithm that combines spline interpolation with wavelet decomposition, in this order, for properly removing baseline changes and spikes. 9 With the corrected optical density, we estimated hemoglobin concentration changes using the modified Beer-Lambert law with differential pathlength factor (DPF) equal to 6 for both wavelengths. 56,57 The hemoglobin time courses were band-pass filtered between 0.009 and 0.08 Hz to remove high-physiological noise and low-frequency drifts. 20,22,48,49 Because our goal was to investigate the constraints of subject identification based on neural hemodynamics, we applied a PCA filter to account for the effects of global systemic physiology in fNIRS measurements by removing the first principal component for oxy-hemoglobin (HbO) and deoxy-hemoglobin (HbR). 45,46,50

Resting-State Functional Connectivity Maps
The last preprocessing step removed the temporal autocorrelation from the fNIRS and fMRI time series with an autoregressive model (pre-whitening) of order P to reduce inflated correlations across the brain. 34,35,58,59 The order of the model was estimated automatically with Bayesian information criteria and independently for each time series of every fNIRS channel or BOLD ROI. We verified that the prewhitening step did not remove the anticorrelation between HbO and HbR ( Fig. S1 in the Supplementary Material). For fNIRS, we estimated the total hemoglobin concentration (HbT) changes as the sum of HbO and HbR. Finally, the Pearson correlation coefficients across ROIs (fMRI) and channels (fNIRS) were computed to provide FC matrices for each subject run and brain signal separately (BOLD, HbO, HbR, or HbT).

Subject Identification
The FC matrices (i.e., correlation matrices) were used to identify participants by comparing a subset of the data (known as the testing dataset) with the remaining independently collected runs (referred to as the training dataset). Here, the testing dataset contained one FC matrix per participant (leave-one-out approach), and the training dataset had a variable number of FC matrices with at least one run per participant. We opted to vary the number of samples per subject in the training dataset to investigate how this variation would affect brain fingerprinting's classification. We evaluated the subject identification performance with either pairwise metrics (see details in Secs. 2.5.1 and 2.5.2) or a linear classifier (see Sec. 2.5.3). For each investigated scenario, the testing and training runs were chosen randomly and independently 300 times. In all cases, the percentage accuracy was defined as E Q -T A R G E T ; t e m p : i n t r a l i n k -; s e c 2 . 5 ; 1 1 6 ; 6 6 3 Accuracyð%Þ ¼ 100 × Number of corrected labels Number of attempts : We used the two-sided Wilcoxon rank-sum test to compare accuracy results obtained under different (independent) scenarios because the distributions in scenarios that yield high accuracy would not be normally distributed. The sample size in all cases was 300 data points. Throughout the results, we provide the Wilcoxon rank sum score for each statistical test and the z-score for computing the approximate p-value of the test. The effect size (estimated with Cohen's D method) and the p-value adjusted for multiple comparisons (derived from the Bonferroni correction) are also included.

Pearson correlation
In this procedure, subject identification was performed with the Pearson correlation coefficient. This approach, widely used in the literature, assumes that FC matrices from the same participant should have higher correlations among each other than across participants. To this end, we rearranged the upper diagonal elements of each FC matrix (because the FC matrices are symmetric) into a vector and then computed the Pearson correlation coefficients among all possible pairs of testing and training vectors. The participant with the FC matrix that yields the highest correlation with the FC matrix of the testing dataset was assigned as the identified subject. 12,27,31

Geodesic distance
The geodesic distance extends linear metrics, such as the Euclidian distance and the Pearson correlation, by considering that the correlation matrices lie in a nonlinear space. 27,28 The geodesic distance between two arbitrary FC matrices (C 1 and C 2 ) was calculated using an affine-invariant Riemannian metric given as E Q -T A R G E T ; t e m p : i n t r a l i n k -; e 0 0 1 ; 1 1 6 ; 3 1 7 where the log represents the logarithmic matrix operator. To compute the geodesic distance, we added the identity matrix, I, to the FC matrices (i.e., C i → C i þ I) to guarantee that all eigenvalues are greater or equal to zero and to avoid convergence problems when inverting the correlation matrices. 27 With this metric, one assumes that FC matrices from the same participant should have smaller distances to each other than across participants.

Linear classifier
We also examined the performance of a data-driven method for the problem of subject identification. Here, we chose a simple linear classifier without a mapping function (i.e.,ŷðw; xÞ ¼ wx, where w and x are the matrix of weights and the attribute vector with incorporated bias, respectively) due to its simplicity. 13 In this approach, we vectorized the FC matrices as described above and optimized the linear classifier with the training dataset via minimization of the sum of the squares of the residuals with a penalization on the size of the coefficients (ridge regression). The output of the linear classifier was an N-dimensional vector (ŷ), in which N is the number of participants in the cohort. The j'th element of the output vector,ŷ j , represents the probability of a given unknown vector from the testing dataset to be identified as subject j. Therefore, the output element with the highest value defined the subject label.

Number of Runs Available for Training Affects Classification Accuracy
First, we investigated how the classification accuracy increases with the number of available runs for training (Fig. 2). For the linear classifier, the average performance increased from 75% to 80% accuracy (when only one run was used for training) to 97% to 99.9% (when we used one run for testing and the remaining runs for training), depending on the contrast used. (Note that, due to the pruning criteria, the number of runs per participant varied from three to five for BOLD and four to six for fNIRS; see Secs. 2.2 and 2.3.) This behavior was also true for the Pearson and geodesic cases, but only when all of the available runs were used for training. In fact, we did not observe relevant differences in accuracy when one or two runs were used for training with these two approaches (Pearson and geodesic), independently of the contrast. As expected, these results illustrate that a minimum amount of temporal information is required to achieve high accuracy; the threshold depends on the contrast provided by the neuroimaging technique and the algorithm used to compare the FC matrices. Overall, at least two 6-min runs per subject in the training dataset appear to be essential for any brain fingerprinting method for fNIRS and BOLD signals.
Concerning the different classification algorithms, the Pearson correlation approach yielded the worst accuracy in all investigated scenarios. Average accuracies barely reached 90%, even when all runs were used for classification. When only one run was available for training, we found that the geodesic distance significantly outperformed the linear classifier by an average (standard deviation) of 11 (6)% for the BOLD, HbO, and HbT contrasts [corrected p-value < 0.001 in all cases; BOLD: z-score = 18.35, rank sum score = 128,651, and effect size = 2.05 (1.86 to 2.25); HbO: z-score = 5.53, rank sum score ¼ 10 5 , and effect size = 0.5 (0.34 to 0.66); and HbT: z-score = 15.4, rank sum score = 122,433, and effect size = 1.5 (1.3 to 1.7)]. The only exception was HbR, in which the geodesic distance and the linear classifier presented similar results with a slight advantage to the latter, although not statistically significant [uncorrected p-value = 0.61, z-score = −0.5, rank sum score = 89,084, and effect size = −0.07 (−0.23 to 0.08)].
This situation changed as soon as more information was added to the training set: the linear classifier presented the best performance when more than one run was available for training. The geodesic distance did not change after the addition of a second run, and the average (standard deviation) accuracies of the linear classifier achieved accuracies as high as 95.6 (4.4)%, 90.8 (5.0)%, 90.8 (5.6)%, and 92.3 (5.1)% for the BOLD, HbO, HbR, and HbT signals. On average, this performance was 10% higher than the geodesic distance when the same scenario was considered [corrected p-value < 0.001; BOLD: z-score = 3.07, rank sum score ¼ 9.6 × 10 4 , and effect size = 0.29 (0.13 to 0.45); HbO: z-score = 16.5, rank sum score ¼ 1.2 × 10 5 , and effect size = 1.73 (1.55 to 1.92); HbR: z-score = 18.28, rank sum score ¼ 1.28 × 10 5 , and effect size = 1.2 (1.8 to 2.2); ad HbT: z-score = 9.37, rank sum score ¼ 1.1 × 10 5 , and effect size = 0.83 (0.66 to 1.00)]. Together, these results show that the geodesic distance greatly benefits from the nonlinearity when little information is available. However, the heterogeneity captured by adding more independent samples compensates for the nonlinear space if they are all used as different inputs in machine learning algorithms.
Concerning the type of contrast, BOLD provided the highest accuracies compared with any fNIRS signal, and the difference was more remarkable when little information was available (i.e., one run for training). In this scenario, BOLD average accuracy was 1.20, 1.27, and 1.08 times greater than HbO, HbR, and HbT, respectively. In fact, BOLD was the only contrast to achieve 90% accuracy when only one run was available for training. Considering the best approach per training condition and brain signal, we observed that the BOLD classifications were still the highest. However, the discrepancy in accuracy between BOLD and fNIRS decreased as the number of runs used for training increased. In the condition with the most runs available for training, the average accuracy of BOLD/fNIRS was 1.09 (HbO), 1.12 (HbR), and 1.05 (HbT).

Amount of Spatial Information Influences Performance on Subject Identification
The minimum amount of temporal information for achieving high accuracy may vary with the spatial information available for analysis. In fact, the greater spatial heterogeneity captured by the higher number of ROIs in BOLD (94 regions, creating a 94 × 94 FC matrix) appears to be the main reason for its higher accuracy when compared with fNIRS (48 channels, creating a 48 × 48 FC matrix).
To investigate whether the larger spatial information drove the higher BOLD accuracy, we evaluated the classification performances as a function of the number of available ROIs. For each modality, we selected random sub-networks with 10, 20, 30, 40, and 48 ROIs and performed the classifications with one run for training and one for testing. The chosen ROIs were randomly selected and repeated ten times (except for the case of using all ROIs), avoiding discrepant accuracy values due to a given joint of ROIs. For each subgroup of ROIs, the set of runs for training and testing was randomly selected and repeated 300 times, as done in the other comparisons. We focused on the geodesic distance because it provided the highest performance when only one run was available for training. Figure 3 shows how the average accuracy increases with the number of ROIs used as inputs in the classification, confirming that a broader coverage of independent ROIs contributes to extracting unique and reliable connectivity patterns for subject identification. ( Figure S2 in the Supplementary Material depicts the whole distribution.) Interestingly, the BOLD-based accuracy did not differ from the HbT accuracy when the two techniques matched the same number of ROIs. This result suggests that the higher performance from BOLD in Fig. 2 likely comes from the higher number of ROIs sampled.
Unlike fMRI, the brain regions measured with fNIRS depend on the instrument setupspecifically, the number of sources and detectors and the density of channels created for a given region. (Note that the number of ROIs may not be directly associated with the number of channels. For example, high-density fNIRS probes with all channels placed locally sample a small ROI, and the information provided by this probe will have a high covariance across channels.) Therefore, the spatial information is highly variable across different fNIRS studies. We attempted to estimate the relationship between the classification accuracy and the number of independent regions measured. Specifically, we approximated the average accuracy to an empirical model that resembles the behavior observed for all contrasts in Fig. 3, i.e., Accuracyð%Þ ¼ α × ð1 − e −γðNumber of ROIsÞ Þ. Table 1 shows the estimated parameters and the chi-square (χ 2 ) goodness-of-fit test (under the null hypothesis that the data follows the specified model). One interesting prediction from the model is that, although HbO and HbT are typically reported in resting-state fNIRS studies, the information contained in these contrasts is limited, with a predicted saturation (estimated by α in our model) of 88% and 98%, respectively. In other words, our model suggests that these contrasts would not provide a target of 100% accuracy with just one single run for training, even in the limit of hundreds of fNIRS channels.
By taking the derivative of the above expression for accuracy with respect to the number of ROIs, we can also estimate the improvement rate in the classification accuracy (AR) as a function of the spatial information. As the accuracy increases with the number of ROIs, the improvement rate will decrease exponentially by a factor γ in our model. Table 1 shows two relevant AR thresholds: 0.50% (AR 0.5% ) and 0.25% (AR 0.25% ). The number of ROIs in which Table 1 Estimates for the data shown in Fig. 3 with the model predicting the accuracy as a function of the number of ROIs. Chi-square estimate (χ 2 ) shows the goodness-of-fit under the null hypothesis that the data follows the specified model. In our problem, the critical χ 2 to reach a significance level of 0.01 must be less than or equal to 0.115 (3 degrees of freedom). The scores AR 0.5% ∕AR 0.25% represent the threshold at which adding an extra ROI would increase the accuracy by <0.5%∕0.25%, respectively.  the improvement in the accuracy rate is lower than 0.50% (AR 0.5% ) can be interpreted as the minimum number of channels for achieving good classification (because there would be a huge margin for significant improvement in accuracy with fewer channels), whereas AR 0.25% can be seen as a "saturating" point for accuracy, i.e., at this point, increasing the number of channels does not considerably increase the accuracy performance. With our data, the number of necessary ROIs to achieve AR 0.5% and AR 0.25% was the highest for HbR when compared with HbO and HbT. This result suggests that the HbR-based resting-state FC (rsFC) needs a broader probe coverage to extract unique individual features to identify participants with the same precision as HbO, HbT, and BOLD. In addition, the model estimates for HbT are comparable to BOLD, reinforcing that this fNIRS parameter could be more suitable to replicating BOLD-based resting-state FC results, as empirically suggested by previous studies. 20,22,45,60,61

HbR Provides Additional Complementary Information to HbT FC Maps
Finally, we explored how combining different fNIRS chromophores would benefit subject identification (Fig. 4). For the Pearson correlation and linear classifier approaches, we combined different hemoglobin information by concatenating their vectorized correlation matrices. For the geodesic distance, we concatenated the FC matrices through the main diagonal to obtain block-diagonal matrices and preserve the spatial properties of the original FC matrices. 28 For this investigation, we used only one run for training and one run for testing because our goal was to compare the combination of contrasts with the effects of having more runs for training.
Overall, combining fNIRS contrasts improves subject identification, suggesting that the different hemoglobin contrasts carry complementary information for extracting connectivity patterns at the individual level. However, it does not compensate for more data points (i.e., having more runs for training is still better than just combining fNIRS data). Figure 4 shows the average accuracy for all possible hemoglobin combinations (see Fig. S3 in the Supplementary Material for the confusion matrices). Concerning all different possibilities, the combination of HbR and HbT provided the highest performance, with an average (standard deviation) accuracy of 90.2 (5.2)%-which is comparable to the 92.3 (5.1)% average accuracy that we obtained for HbT-only with two runs for training. Interestingly, adding HbO to the HbR+HbT combination did not increase accuracy. Concerning the different classification strategies, the geodesic distance again outperformed the other approaches in all cases.

Discussion
This work focuses on identifying subjects based on their neural patterns as measured by rsFC, so called brain fingerprinting. Although brain fingerprinting has not gained practical application, it illustrates the value of improving data acquisition and analysis methods to enhance intrasubject reproducibility with neuroimaging techniques. We believe that the search for individual features should be further explored in functional neuroimaging despite the scientific challenges because accurate functional cortical information holds the potential to guide treatment or rehabilitation decisions, favoring patient recovery and outcome in the short term. 30 As subject identification is likely to be more dependent on acquisition conditions than analysis pipelines, we employed a validated analysis pipeline that was previously demonstrated to reproduce resting-state networks commonly reported in the literature 45 and examined how spatiotemporal properties affect reproducibility and, thus, brain fingerprinting.
It is quite straightforward-and even intuitive-that data acquisition conditions will influence the ability to identify subjects. For each subject, it is ideal to collect several runs from many different ROIs for as long as possible with a variety of techniques to obtain as much information as possible. However, this approach is neither feasible nor scalable in practice. The fNIRS systems have a limited number of channels to probe different ROIs that depend on the number of sources and detectors and how dense the coverage of a single region should be. In addition, collecting long datasets from multiple runs or sessions is time-consuming, demands a high operational cost, and is restricted to specific populations. Hence, real datasets have tradeoffs, and the present work investigates relevant acquisition constraints and limitations associated with the different choices that must be made for solving the problem of brain fingerprinting using resting-state functional measurements.
Overall, our results show that FC maps obtained at rest carry sufficient information to identify participants with near-perfect accuracy. This finding indicates that either fNIRS or fMRI can isolate the uniqueness of individual brains despite the presence of robust inter-subject resting-state networks. 45,62,63 Our ability to identify distinctive and reliable brain features across such a homogenous group of mainly young males (26 out of 29), which presents a significant challenge for the subject classification methodologies, needs to be highlighted. In samples with an even sex balance, sex-related differences could facilitate subject identification because FC is known to differ according to sex. 64 Previous fMRI studies have introduced different approaches for performing subject identification, primarily based on pairwise distance metrics, such as the Pearson correlation and geodesic distance. 27 In this vein, our results reinforce the better performance of the geodesic distance over the Pearson correlation. We additionally explored how standard machine learning algorithms contribute to brain fingerprinting. Here, we chose to use a linear classifier without a mapping function. Our results show that this algorithm was sufficient for capturing the individual variability when enough data were available, suggesting that the choice of data-driven algorithms is not the main factor that drives this problem; consequently, using more computationally intensive algorithms does not add any significant contributions to brain fingerprinting under these conditions. In fact, we tested the performance of a standard multilayer perceptron (MLP) artificial neural network in our data under different conditions (results not shown). Specifically, we tested several MLP architectures by varying the number of hidden layers (1, 2, and 3) and the number of neurons per hidden layer. In all investigated scenarios, the MLP did not yield higher accuracies than the linear classifier. From the computational perspective, the better performance of the linear classifier is likely related to the relatively low amount of data available per subject (in this study, five runs for training in the best case).
However, very few experimental protocols acquire more temporal data than ours; therefore, the main advantages of conventional machine learning approaches will be limited in functional neuroimaging data given the current experimental protocols (although they would truly benefit from larger databases, such as the human connectome project available for fMRI, 65 for such datasets are still not available for fNIRS). The use of few-shot learning algorithms, in which a machine learning model learns from only one or a few samples, would probably be more beneficial to fNIRS than conventional deep-learning neural networks. 66 In addition, another promising alternative would be to add a kernel to the linear classifier or employ a nearest neighbors' algorithm (NNA). A proper kernelization strategy could improve the performance of the linear classifier to excel the geodesic distance, and the NNA should perform at least better than the Pearson correlation. Meanwhile, acquiring more data from broader ROIs would bring more value to the goal of reducing intra-subject variability for single-subject analysis.
A relevant methodological difference between pairwise metrics and machine learning approaches is how they use the temporal information available. When more than one run is available, the correlation matrices are typically averaged across runs in the Pearson and geodesic distance calculation. 12,27 For the linear classifier, each FC matrix is used as an independent sample for each participant. 13 Using every run as a sample will be more robust than averages because it will not average out dynamic features that are not present in every run but still characterize each participant. [67][68][69] For this reason, the linear approach outperformed the other methods for the fNIRS signals when more than one run was available for training (Fig. 2). The difference between the linear classifier and the geodesic distance is not as evident for fMRI as for the fNIRS because the correct computation of the pairwise distance in the nonlinear correlation space across a broad spatial dataset compensates for the lost temporal information. Thus, it is possible that the geodesic distance could have a similar performance to the linear classifier when used on broad fNIRS probes that cover independent regions, as suggested by the results presented in Fig. 3. In the case in which more spatial information is available, the geodesic distance would probably be preferred over machine learning techniques due to its simplicity and low computational cost.
The number of covered ROIs (not necessarily the number of fNIRS channels or fMRI voxels) is a crucial factor in extracting specific and meaningful information to identify individuals. The behavior seen in Fig. 3 indicates that measuring more brain regions can uncover unique connectivity patterns and favor subject identification. Because fMRI experiments naturally probe the entire brain cortex, this finding is more relevant to the fNIRS field; it implies that, although challenging, whole-head fNIRS measurements are recommended for individual monitoring and are preferred over dense local fNIRS measurements if sufficient sources and detectors are not available.
In terms of identifying subjects, it would be interesting to isolate which brain areas, if any, are most relevant in the future. Although we did not isolate which subnetworks could play a more important role to subject differentiation, it has been argued that frontoparietal networks are crucial to individual uniqueness. 12 Frontoparietal networks are more linked to higher-order cognitive processing than primary sensory functions; thereby, it is not unreasonable to expect that these networks may encode the uniqueness of each brain. Interestingly, frontoparietal networks have also demonstrated great value in the clinic at the individual level for quantifying residual cognitive function in patients with disorders of consciousness. 70,71 In the future, combining the knowledge of the relevant brain regions with real-time neuronavigation for accurate fNIRS placement of sparse optical probes could overcome constraints due to a limited number of optodes and avoid unnecessary broad optical designs that are often time-consuming to set up. 10,24 Among all fNIRS contrasts, HbT provided the highest classification accuracies, which is consistent with previous studies that found HbT to be the most reliable feature for rsFC. 20,22,45,60 By combining HbO and HbR, HbT captures information from both. However, one might hypothesize that combining different fNIRS contrasts would still yield complementary information due to differences in HbO, HbR, and HbT dynamics during brain activity. 5,38,[72][73][74] In fact, adding HbR to HbT resulted in improved classification, but adding HbO to the combination of HbR and HbT made no difference in average accuracy (Fig. 4). As HbO fluctuations are higher than HbR fluctuations, they drive HbT fluctuations, which may explain why HbO does not increase accuracy in the presence of HbT. In contrast, HbR contributes little to HbT, so combining the two will give HbR more weight in the classification problem, which can help identify subjects.
We found, however, that the combination of fNIRS contrasts only showed modest improvements in classification accuracy overall and did not compensate for the acquisition of multiple runs from the same subject. In addition, the combined data did not reach accuracies close to the ones obtained with the BOLD signal, indicating that the spatial information is also more relevant than the complementary dynamics of HbO, HbR, and HbT. The low improvement from the combination of fNIRS contrasts is probably related to the high cross-correlation among HbO, HbR, and HbT; note that, for neural hemodynamics, HbO and HbR measured in the same channel are expected to be anticorrelated. Of relevant note on this topic, in this work, we combined the FC matrices from each hemoglobin by concatenating their correlation values. An alternative approach would be to treat each fNIRS signal as independent input and then compute the Pearson correlation coefficient across them. Therefore, we would have access to the crosscorrelations between HbO, HbR, and HbT instead of only their concatenated correlation values. This approach, combined with the geodesic distance, could potentially increase the observed rates.
For the sole purpose of subject identification, it is worth noting that the fNIRS sensitivity to extracerebral hemodynamics and systemic physiology could also be used as an additional feature (although it would not constitute brain fingerprinting). In fact, for this particular study, in which data were obtained within a few minutes of each other during the same session, the additional information provided by global hemodynamics (which we removed with PCA) could facilitate the identification of subjects from fNIRS data and increase the classification accuracies reported above. However, systemic hemodynamics inherent in fNIRS data can be difficult to generalize to a more global context. Using only systemic information might not be beneficial to subject identification because global systemic hemodynamics are governed by feedback and feedforward mechanisms of the nervous system as a response to the body's needs. For a given subject, the systemic pattern would likely be more variable than the neural pattern if data were acquired at different sessions.

Conclusion
In summary, we investigated the problem of subject identification with resting-state brain FC through simultaneous measurements with fMRI and fNIRS data from 29 healthy and young subjects. To our knowledge, no previous work has performed a comprehensive study of the feasibility of using fNIRS for identifying participants during the resting state with fMRI validation. Considering the results obtained with our dataset, we suggest acquiring at least four runs per subject (three for training) to achieve close to 100% accuracy. For the fNIRS measurements, it is important to have a wide coverage with ∼50 channels (ROIs). In addition, the combination of HbR and HbT provides higher classification performance than each hemoglobin separately. We highlight that correctly classifying subjects in such a homogenous group as ours shows the tremendous potential of fMRI and fNIRS for intra-subject analysis, particularly with rsFC.

Disclosures
The authors have no conflicts of interest to declare.